Spatial audio



Background

Prior solutions in audio coders that have been suggested to reduce the bitrate of stereo program material include:

'Intensity stereo'. In this algorithm, high frequencies (typically above 5 kHz) are represented by a single audio signal (i.e., mono), combined with time-varying and frequency-dependent scalefactors.

'M/S stereo'. In this algorithm, the signal is decomposed into a sum (or mid, or common) and a difference (or side, or uncommon) signal. This decomposition is sometimes combined with principal component analysis or time-varying scalefactors. These signals are then coded independently, either by a transform coder or waveform coder. The amount of information reduction achieved by this algorithm strongly depends on the spatial properties of the source signal. For example, if the source signal is monaural, the difference signal is zero and can be discarded. However, if the correlation of the left and right audio signals is low (which is often the case), this scheme offers only little advantage.
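The monaural case described above can be checked with a minimal sketch. The function names and the scaling by 1/2 are assumptions made for illustration; the text only specifies a sum and a difference signal.

```python
def ms_encode(left, right):
    """Split a stereo pair into mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A monaural source: both channels carry the same samples,
# so the side signal is all zeros and need not be transmitted.
mono = [0.0, 0.5, -0.25, 1.0]
mid, side = ms_encode(mono, mono)
```

For weakly correlated channels the side signal carries about as much energy as the mid signal, which is why the scheme then offers little advantage.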

Parametric descriptions of audio signals have gained interest during the last years, especially in the field of audio coding. It has been shown that transmitting parameters that describe audio signals requires only little transmission capacity to resynthesize a perceptually equal signal at the receiving end. However, current parametric audio coders focus on coding monaural signals, and stereo signals are often processed as dual mono.



According to an aspect of the invention, spatial attributes of multichannel audio signals are parameterized. It will be shown that for general audio coding applications, transmitting these parameters combined with only one monaural audio signal will strongly reduce the transmission capacity necessary to transmit the stereo signal compared to audio coders that process the channels independently, while maintaining the original spatial impression. An important issue is that although people receive waveforms of an auditory object twice (once by the left ear and once by the right ear), only a single auditory [...]
[...] objects, for example a musical recording. This problem [...] auditory objects, [...] describing the spatial [...]

[...] ERB-rate scale, the bandwidth of these signals depends [...]

It is interesting [...] the maximum [...] important localization cues in the horizontal [...]
[...] conception is strongly related to the perceptual spatial diffuseness (or compactness) of a sound source.

It is an insight of the inventors that it is sufficient to describe spatial attributes of any multichannel audio signal by specifying the ILD, ITD (or IPD) and maximum correlation as a function of time and frequency.

An embodiment of the current invention aims at describing a multichannel audio signal by:

one monaural signal, consisting of a certain combination of the input signals, and

a set of spatial parameters: two localization cues (ILD, and ITD or IPD) and a parameter that describes the similarity or dissimilarity of the waveforms that cannot be accounted for by ILDs and/or ITDs (e.g., the maximum of the cross-correlation function), preferably for every time/frequency slot. Preferably, spatial parameters are included for each additional auditory channel.

Advantages of this parametric description are the following:

- Decoupling of monaural and binaural signal parameters in audio coders. Hence, difficulties related to stereo audio coders are strongly reduced (such as the audibility of interaurally uncorrelated quantization noise compared to interaurally correlated quantization noise).

- Strong bitrate reduction in audio coders due to a low update rate and low frequency resolution required for the spatial parameters. The associated bitrate to code the spatial parameters is typically 10 kbit/s or less (see embodiment).

- Easy combination with existing audio coders. The proposed scheme produces one mono signal that can be coded and decoded with any existing coding strategy. After monaural decoding, the system described here regenerates the spatial attributes.
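As a rough plausibility check of the bitrate figure in the list above, the raw parameter rate can be estimated from the number of subbands, the number of parameters per subband, the bits spent per parameter and the update rate. Only the 10 kbit/s figure, the 20 subbands and the 44.1 kHz sampling rate come from the text; the 4 bits per parameter and the hop of 1024 samples are assumptions chosen purely for illustration, and no entropy coding is modelled.

```python
def spatial_bitrate(n_subbands, n_params, bits_per_param, frame_hop, sample_rate):
    """Raw bitrate in bit/s of the spatial parameters (no entropy coding)."""
    updates_per_second = sample_rate / frame_hop
    return n_subbands * n_params * bits_per_param * updates_per_second


# 20 subbands, 3 parameters (ILD, ITD, correlation) per subband.
# bits_per_param=4 and frame_hop=1024 are hypothetical values.
rate = spatial_bitrate(n_subbands=20, n_params=3, bits_per_param=4,
                       frame_hop=1024, sample_rate=44100)
```

With these assumed numbers the raw rate comes out at roughly 10.3 kbit/s. Sending fewer parameters in some subbands or updating less often would lower it further.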

The set of spatial parameters can be used as an enhancement layer in audio coders. For example, a mono signal is transmitted if only a low bitrate is allowed, while by including the spatial enhancement layer the decoder can reproduce stereo sound.

The invention can in principle be used to generate n channels from one mono signal, if (n-1) sets of spatial parameters are transmitted. In such condition, the spatial parameters describe how to form the n different audio channels from the single mono signal.



Analysis methods

In the following, it is assumed that the incoming signals are split up in band-pass signals (preferably with a bandwidth which increases with frequency) and that [...]
An important issue of transmission of parameters is the accuracy of the parameter representation [...] of quantization [...] necessary transmission capacity. In this section, several [...] quantization of the spatial parameters will be [...]. The basic [...] quantization errors on so-called just-noticeable differences (JNDs) of the spatial [...]. To be more specific, the quantization error is determined [...] to changes in the parameters. Since it is well known that the sensitivity to changes in the parameters strongly depends on the values of the parameters themselves, [...] following methods to determine the discrete [...]

It is known from psychoacoustic research that [...] ILD depends on the ILD itself. If the ILD is expressed in dB, deviations of [...] dB from a reference of 0 dB are detectable, while changes [...] reference level difference amounts to 20 dB. Therefore, [...]

[...] non-linear (compressive) transformation of the obtained level difference [...] process, or by using a lookup table for the [...] distribution. The embodiment below gives an example [...]
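The lookup-table approach to level-difference quantization can be sketched as follows. The table values are hypothetical and chosen only to show finer steps near 0 dB and coarser steps at large level differences; the table used in the embodiment is not legible in this text.

```python
# Hypothetical ILD table in dB: 2 dB steps near 0 dB, 3 dB steps at large |ILD|.
ILD_TABLE_DB = [-19, -16, -13, -10, -8, -6, -4, -2, 0,
                2, 4, 6, 8, 10, 13, 16, 19]


def quantize_ild(ild_db):
    """Return the index of the table entry closest to ild_db."""
    return min(range(len(ILD_TABLE_DB)),
               key=lambda i: abs(ILD_TABLE_DB[i] - ild_db))


def dequantize_ild(index):
    """Return the ILD in dB represented by a table index."""
    return ILD_TABLE_DB[index]
```

Values outside the table range are clamped to the nearest end entry, which is one possible design choice rather than something the text prescribes.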






The quantization error of the correlation depends on (1) the correlation value itself and possibly (2) on the ILD. Correlation values near +1 are coded with a high accuracy (i.e., a small quantization step), while correlation values near 0 are coded with a low accuracy (a large quantization step). An example of a set of non-linearly distributed [...] is given in the embodiment. A second possibility is to use quantization steps for the correlation that depend on the measured ILD of the same subband: for large ILDs (i.e., one channel is dominant in terms of energy), the quantization errors in the correlation become larger. An extreme example of this principle would be to not transmit correlation values for a certain subband at all if the absolute value of the ILD for that subband is beyond a certain threshold.
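Both ideas in this paragraph, non-uniform correlation levels and skipping the correlation when one channel dominates, can be sketched together. The level set and the 19 dB threshold are hypothetical values picked for illustration, not values taken from the embodiment.

```python
# Hypothetical levels: dense near +1, sparse near 0.
CORR_LEVELS = [1.0, 0.95, 0.9, 0.82, 0.75, 0.6, 0.3, 0.0]
# Hypothetical threshold above which the correlation is not transmitted.
ILD_SKIP_THRESHOLD_DB = 19.0


def quantize_correlation(r, ild_db):
    """Return an index into CORR_LEVELS, or None if nothing is transmitted."""
    if abs(ild_db) > ILD_SKIP_THRESHOLD_DB:
        return None  # one channel dominates; skip the correlation
    return min(range(len(CORR_LEVELS)),
               key=lambda i: abs(CORR_LEVELS[i] - r))
```

A decoder following this sketch would need a default correlation for subbands where `None` was signalled; that choice is left open here.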



PHNL020356EPP    22.04.2002

[...] a constant phase threshold. This means that in terms of delay times, [...] waveforms. This phenomenon can be [...] parameters [...] a certain frequency (typically 2 kHz).

A third method of bitrate reduction [...]

Fig. 1: Schematic diagram of an embodiment of the invention. [...]
Analysis

The left and right incoming signals are split up in various time frames (2048 samples at 44.1 kHz sampling rate) and windowed with a square-root Hanning window. Subsequently, FFTs are computed. The negative FFT frequencies are discarded and the resulting FFTs are subdivided into groups (subbands) of FFT bins. The number of FFT bins that are combined in a subband g depends on the frequency: at higher frequencies more bins are combined than at lower frequencies. In the current implementation, FFT bins corresponding to approximately 1.8 ERBs (Equivalent Rectangular Bandwidth) are grouped, resulting in 20 subbands to represent the entire audible frequency range. The resulting number of FFT bins S[g] of each subsequent subband (starting at the lowest frequency) is
S = [4 4 4 5 6 8 9 12 13 17 21 25 30 38 45 55 68 82 100 477]
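The grouping of FFT bins into subbands can be sketched as a list of bin ranges derived from the counts above. Two things are assumptions here: the 17th count is read as 68 (the value is damaged in the text, and 68 makes the counts add up to 1023), and the first subband is taken to start at bin 1, skipping the DC bin.

```python
from itertools import accumulate

# Bins per subband, lowest frequency first (17th value assumed to be 68).
BINS_PER_SUBBAND = [4, 4, 4, 5, 6, 8, 9, 12, 13, 17,
                    21, 25, 30, 38, 45, 55, 68, 82, 100, 477]


def subband_edges(bins_per_subband, first_bin=1):
    """Return one (start, stop) FFT-bin range per subband, stop exclusive.

    first_bin=1 skips the DC bin; that starting point is an assumption.
    """
    stops = [first_bin + total for total in accumulate(bins_per_subband)]
    starts = [first_bin] + stops[:-1]
    return list(zip(starts, stops))


EDGES = subband_edges(BINS_PER_SUBBAND)
```

Under these assumptions the 20 ranges cover bins 1 to 1023 of a 2048-point FFT, that is, every positive-frequency bin below the Nyquist bin.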

Thus, the first three subbands contain 4 FFT bins, the fourth subband contains 5 FFT bins, etc. For each subband, the corresponding ILD, ITD and correlation (r) are computed. The ITD and correlation are computed simply by setting all FFT bins which belong to other groups to zero, multiplying the resulting (band-limited) FFTs from the left and right channels, followed by an inverse FFT transform. The resulting cross-correlation function is scanned for a peak within an interchannel delay between -64 and +63 samples. The internal delay corresponding to the peak is used as ITD value, and the value of the cross-correlation function at this peak is used as this subband's interaural correlation. Finally, the ILD is simply computed by taking the power ratio of the left and right channels for each subband.
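The per-subband analysis can be sketched as below. Several points are assumptions rather than statements from the text: one spectrum is conjugated before the multiplication (the usual cross-correlation theorem; the text only says "multiplying"), the correlation is normalized by the band powers, the ILD is expressed in dB, and a positive ITD is taken to mean that the right channel lags the left. Instead of a full inverse FFT, only the 128 lags of interest are evaluated directly.

```python
import cmath
import math


def analyze_subband(left, right, start, stop, n_fft, max_lag=64):
    """Return (ild_db, itd, r) for FFT bins start..stop-1 of one frame.

    left, right: sequences of complex FFT bins indexed by bin number.
    Assumed convention: itd > 0 means the right channel lags the left.
    Assumes both channels carry some energy in the band.
    """
    band = range(start, stop)
    p_left = sum(abs(left[k]) ** 2 for k in band)
    p_right = sum(abs(right[k]) ** 2 for k in band)
    norm = math.sqrt(p_left * p_right)

    # Cross-spectrum; bins outside the band count as zero.
    cross = [left[k].conjugate() * right[k] for k in band]

    best_lag, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag):  # -64 .. +63 samples
        # One inverse-DFT output sample, normalized by the band powers.
        acc = sum(c * cmath.exp(2j * math.pi * k * lag / n_fft)
                  for c, k in zip(cross, band))
        r = acc.real / norm
        if r > best_r:
            best_lag, best_r = lag, r

    ild_db = 10 * math.log10(p_left / p_right)
    return ild_db, best_lag, best_r
```

For a right channel that is a delayed and attenuated copy of the left, this sketch returns the delay as ITD and a correlation close to 1.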

Generation of the sum signal

The left and right subbands are summed after a phase correction (temporal alignment). This phase correction follows from the computed ITD for that subband and consists of delaying the left-channel subband with ITD/2 and the right-channel subband with -ITD/2. The delay is performed in the frequency domain by appropriate modification of the phase angles of each FFT bin. Subsequently, the sum signal is computed by adding the phase-modified versions of the left and right subband signals. Finally, to compensate for uncorrelated or correlated addition, each subband of the sum signal is multiplied with sqrt(2/(1+r)), with r the correlation of the corresponding subband. If necessary, the sum signal can be converted to the time domain by (1) inserting complex conjugates at negative frequencies, (2) inverse FFT, (3) windowing, and (4) overlap-add.
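The phase alignment and energy compensation can be sketched for one subband as follows. The function name and argument layout are hypothetical. The sign convention is an assumption: a positive ITD means the right channel lags the left, so that delaying the left channel by ITD/2 and the right channel by -ITD/2 aligns them.

```python
import cmath
import math


def downmix_subband(left, right, start, stop, itd, r, n_fft):
    """Return the aligned, compensated sum signal for bins start..stop-1.

    Assumed convention: itd > 0 means the right channel lags the left.
    Requires r > -1.
    """
    gain = math.sqrt(2.0 / (1.0 + r))
    out = []
    for k in range(start, stop):
        # A delay of d samples multiplies bin k by exp(-2j*pi*k*d/n_fft).
        rot = cmath.exp(-2j * math.pi * k * (itd / 2) / n_fft)
        aligned_left = left[k] * rot                 # delayed by +itd/2
        aligned_right = right[k] * rot.conjugate()   # delayed by -itd/2
        out.append(gain * (aligned_left + aligned_right))
    return out
```

With fully correlated inputs (r = 1) the gain is 1 and the aligned channels simply add. With uncorrelated inputs (r = 0) the gain is sqrt(2).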
[...] description allows strong bitrate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The decoder can form the original amount of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial [...]
CLAIMS:



1. A method of coding an audio signal, the method comprising:

generating a monaural signal comprising a certain combination of at least two input audio channels,

analyzing spatial parameters of the at least two input audio channels, preferably for each time/frequency slot, to obtain a set of spatial parameters, preferably for every time/frequency slot, the set including at least two localization cues (e.g. ILD, and ITD or IPD) and a parameter that describes a similarity or dissimilarity of waveforms that cannot be accounted for by the localization cues, the parameter being e.g. a maximum of a cross-correlation function, and

generating an encoded signal comprising the monaural signal and the set of spatial parameters.

2. An encoder for coding an audio signal, the encoder comprising:

means for generating a monaural signal comprising a certain combination of at least two input audio channels,

means for analyzing spatial parameters of the at least two input audio channels, preferably for each time/frequency slot, to obtain a set of spatial parameters, preferably for every time/frequency slot, the set including at least two localization cues (e.g. ILD, and ITD or IPD) and a parameter that describes a similarity or dissimilarity of waveforms that cannot be accounted for by the localization cues, the parameter being e.g. a maximum of a cross-correlation function, and

means for generating an encoded signal comprising the monaural signal and the set of spatial parameters.

3. An apparatus for supplying an audio signal, the apparatus comprising:

an input for receiving an audio signal,

an encoder as claimed in claim 2 for encoding the audio signal to obtain an encoded audio signal, and

an output for supplying the encoded audio signal.

4. An encoded audio signal, the signal comprising:

a monaural signal comprising a certain combination of at least two audio channels, and

a set of spatial parameters, preferably for every time/frequency slot, the set including at least two localization cues (e.g. ILD, and ITD or IPD) and a parameter that describes a similarity or dissimilarity of waveforms that cannot be accounted for by the localization cues, the parameter being e.g. a maximum of a cross-correlation function.

5. A storage medium on which an encoded signal as claimed in claim 4 has been stored.



6. A method of decoding an encoded audio signal, the method comprising:

obtaining a monaural signal from the encoded audio signal, the monaural signal comprising a certain combination of at least two audio channels, and

obtaining a set of spatial parameters from the encoded audio signal, preferably for every time/frequency slot, the set including at least two localization cues (e.g. ILD, and ITD or IPD) and a parameter that describes a similarity or dissimilarity of waveforms that cannot be accounted for by the localization cues, the parameter being e.g. a maximum of a cross-correlation function, and

applying the spatial parameters to the monaural signal of the at least two audio channels to generate a multi-channel output signal.

7. A decoder for decoding an encoded audio signal, the decoder comprising:

means for obtaining a monaural signal from the encoded audio signal, the monaural signal comprising a certain combination of at least two audio channels, and

means for obtaining a set of spatial parameters from the encoded audio signal, preferably for every time/frequency slot, the set including at least two localization cues (e.g. ILD, and ITD or IPD) and a parameter that describes a similarity or dissimilarity of waveforms that cannot be accounted for by the localization cues, the parameter being e.g. a maximum of a cross-correlation function, and

means for applying the spatial parameters to the monaural signal of the at least two audio channels to generate a multi-channel output signal.




8. An apparatus for supplying a decoded audio signal, the apparatus comprising:

an input for receiving an encoded audio signal,

a decoder as claimed in claim 7 for decoding the encoded audio [...]
ABSTRACT:



In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial attributes of multichannel audio signals. This parametric description allows strong bitrate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The decoder can form the original amount of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial impression at the receiving end.